SCRY: Extending SPARQL with Custom Data Processing Methods for the Life Sciences

Authors

  • Bas Stringer
  • Albert Meroño-Peñuela
  • Sanne Abeln
  • Frank van Harmelen
  • Jaap Heringa
Abstract

An ever-growing number of life science databases are (partially) exposed as RDF graphs (e.g. UniProt, TCGA, DisGeNET, Human Protein Atlas), complementing traditional methods to disseminate biodata. The SPARQL query language provides a powerful tool to rapidly retrieve and integrate this data. However, the inability to incorporate custom data processing methods in SPARQL queries inhibits its application in many life science use cases. It should take far less effort to integrate data processing methods, such as BLAST, with SPARQL. We propose that an effective framework for extending SPARQL with custom methods should fulfill four key requirements: generality, reusability, interoperability and scalability. We present SCRY, the SPARQL compatible service layer, which provides custom data processing within SPARQL queries. SCRY is a lightweight SPARQL endpoint that interprets parts of basic graph patterns as input for user-defined procedures, generating an RDF graph against which the query is resolved on demand. SCRY’s federation-oriented design allows for easy integration with existing endpoints, extending SPARQL’s functionality to include custom data processing methods in a decoupled, standards-compliant, tool-independent manner. We demonstrate the power of this approach by performing statistical analysis of a benchmark, and by incorporating BLAST in a query which simultaneously finds the tissues expressing Hemoglobin β and its homologs.
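
The mechanism described in the abstract (triple patterns read as procedure input, an RDF graph generated on demand, and the query resolved against that graph) can be illustrated with a toy sketch. This is not SCRY's code or API: the registry `PROCEDURES`, the `procedure` decorator, the `http://example.org/...` predicate and the string-reversing procedure are all hypothetical stand-ins, and triples are plain Python tuples rather than real RDF terms.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical registry: predicate IRI -> user-defined procedure.
PROCEDURES: Dict[str, Callable[[str], List[str]]] = {}


def procedure(predicate: str):
    """Register a function as the procedure behind a predicate (assumed design)."""
    def register(func: Callable[[str], List[str]]):
        PROCEDURES[predicate] = func
        return func
    return register


# Made-up predicate IRI, used only for this illustration.
REVERSE = "http://example.org/procedures/reverse"


@procedure(REVERSE)
def reverse(sequence: str) -> List[str]:
    """Toy stand-in for a real data processing method such as BLAST."""
    return [sequence[::-1]]


def generate_graph(patterns: List[Triple]) -> List[Triple]:
    """Treat each pattern with a known predicate and a bound subject as a
    procedure call, and collect the outputs as triples."""
    graph: List[Triple] = []
    for subject, predicate, _ in patterns:
        func = PROCEDURES.get(predicate)
        if func is not None and not subject.startswith("?"):
            graph.extend((subject, predicate, out) for out in func(subject))
    return graph


def resolve(patterns: List[Triple], graph: List[Triple]) -> List[Dict[str, str]]:
    """Bind variables in object position by matching against the generated graph."""
    bindings: List[Dict[str, str]] = []
    for subject, predicate, obj in patterns:
        if not obj.startswith("?"):
            continue
        for g_subject, g_predicate, g_obj in graph:
            if (g_subject, g_predicate) == (subject, predicate):
                bindings.append({obj: g_obj})
    return bindings


# Usage: one pattern whose subject is the procedure input.
patterns = [("MVHLTPEEK", REVERSE, "?reversed")]
graph = generate_graph(patterns)
print(resolve(patterns, graph))
```

In a federated setting, a pattern like this would presumably sit inside a SPARQL `SERVICE` clause pointing at the procedure-running endpoint, so that the main endpoint never needs to know how the triples were produced.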

Related resources

SCRY: Enabling Quantitative Reasoning in SPARQL Queries

The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences. SCRY, our SPARQL compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on demand. The power of this approach is demonstrated with two use cases, where we use SCRY to calc...

To SCRY Linked Data: Extending SPARQL the Easy Way

Scientific communities are increasingly publishing datasets on the Web following the Linked Data principles, storing RDF graphs in triplestores and making them available for querying through SPARQL. However, solving domain-specific problems often relies on information that cannot be included in such triplestores. For example, it is virtually impossible to foresee, and precompute, all statistica...

FedViz: A Visual Interface for SPARQL Queries Formulation and Execution

Health care and life sciences research heavily relies on the ability to search, discover, formulate and correlate data from distinct sources. Over the last decade the deluge of health care life science data and the standardisation of linked data technologies resulted in publishing datasets of great importance. This emerged as an opportunity to explore new ways of bio-medical discovery through s...

BioFed: federated query processing over life sciences linked open data

BACKGROUND: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested ...

Federated Query Formulation and Processing through BioFed

A single interface for accessing life sciences (LS) data is a natural need to master the data deluge in this domain. The data in the LS requires integration and current integrative solutions increasingly rely on the federation of queries for distributed resources. This paper demonstrates BioFed, a federated SPARQL query processing system customised for LS-LOD. BioFed enables users to formulate a...

Publication date: 2016